
    Self-Updating Models with Error Remediation

    Many environments currently employ machine learning models for data processing and analytics that were built using a limited number of training data points. Once deployed, the models are exposed to significant amounts of previously-unseen data, not all of which is representative of the original, limited training data. However, updating these deployed models can be difficult due to logistical, bandwidth, time, hardware, and/or data sensitivity constraints. We propose a framework, Self-Updating Models with Error Remediation (SUMER), in which a deployed model updates itself as new data becomes available. SUMER uses techniques from semi-supervised learning and noise remediation to iteratively retrain a deployed model using intelligently-chosen predictions from the model as the labels for new training iterations. A key component of SUMER is the notion of error remediation, as self-labeled data can be susceptible to the propagation of errors. We investigate the use of SUMER across various data sets and iterations. We find that self-updating models (SUMs) generally perform better than models that do not attempt to self-update when presented with additional previously-unseen data. This performance gap is accentuated in cases where there are only limited amounts of initial training data. We also find that the performance of SUMER is generally better than the performance of SUMs, demonstrating a benefit in applying error remediation. Consequently, SUMER can autonomously enhance the operational capabilities of existing data processing systems by intelligently updating models in dynamic environments. Comment: 17 pages, 13 figures, published in the proceedings of the Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications II conference in the SPIE Defense + Commercial Sensing 2020 symposium.
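
    The self-labeling loop described above can be illustrated with a short sketch. This is not the authors' SUMER code; it assumes a scikit-learn-style classifier and uses a simple confidence threshold as a stand-in for the paper's "intelligently-chosen predictions" and error-remediation step, and names such as self_update and confidence_threshold are illustrative only.

        # Minimal self-training sketch (not the SUMER implementation itself).
        # The deployed model labels new data; only high-confidence predictions
        # are kept as pseudo-labels (a crude stand-in for error remediation),
        # and the model is retrained on the enlarged training set.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def self_update(model, X_train, y_train, X_new,
                        confidence_threshold=0.95, iterations=5):
            for _ in range(iterations):
                if len(X_new) == 0:
                    break
                proba = model.predict_proba(X_new)
                confidence = proba.max(axis=1)
                pseudo_labels = model.classes_[proba.argmax(axis=1)]
                keep = confidence >= confidence_threshold  # remediation stand-in
                if not keep.any():
                    break
                X_train = np.vstack([X_train, X_new[keep]])
                y_train = np.concatenate([y_train, pseudo_labels[keep]])
                X_new = X_new[~keep]
                model.fit(X_train, y_train)
            return model

        # Example usage with a small initial model:
        # model = RandomForestClassifier().fit(X_small, y_small)
        # model = self_update(model, X_small, y_small, X_unlabeled)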

    Dynamic Analysis of Executables to Detect and Characterize Malware

    Ensuring the integrity of systems that process sensitive information and control many aspects of everyday life is a pressing need. We examine the use of machine learning algorithms to detect malware using the system calls generated by executables, which alleviates attempts at obfuscation because the behavior is monitored rather than the bytes of an executable. We examine several machine learning techniques for detecting malware, including random forests, deep learning techniques, and liquid state machines. The experiments examine the effects of concept drift on each algorithm to understand how well the algorithms generalize to novel malware samples by testing them on data that was collected after the training data. The results suggest that each of the examined machine learning algorithms is a viable solution to detect malware, achieving between 90% and 95% class-averaged accuracy (CAA). In real-world scenarios, the performance evaluation on an operational network may not match the performance achieved in training. Namely, the CAA may be about the same, but the values for precision and recall over the malware can change significantly. We structure experiments to highlight these caveats and offer insights into expected performance in operational environments. In addition, we use the induced models to gain a better understanding of what differentiates the malware samples from the goodware, which can further be used as a forensics tool to understand what the malware (or goodware) was doing and to provide directions for investigation and remediation. Comment: 9 pages, 6 tables, 4 figures.
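
    The class-averaged accuracy (CAA) reported above is assumed here to be the unweighted mean of per-class recalls (i.e., balanced accuracy); the short sketch below computes that metric. The exact definition and the function name class_averaged_accuracy are assumptions for illustration, not taken verbatim from the paper.

        # Sketch of class-averaged accuracy (CAA), assumed to be the unweighted
        # mean of per-class recalls; this is the usual definition but is not
        # quoted from the paper.
        import numpy as np
        from sklearn.metrics import confusion_matrix

        def class_averaged_accuracy(y_true, y_pred):
            cm = confusion_matrix(y_true, y_pred)          # rows: true classes
            per_class_recall = cm.diagonal() / cm.sum(axis=1)
            return per_class_recall.mean()

        # Example with goodware=0 and malware=1:
        # y_true = [0, 0, 0, 1, 1]; y_pred = [0, 0, 1, 1, 1]
        # per-class recalls are 2/3 and 1.0, so CAA is about 0.83.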

    Tracking Cyber Adversaries with Adaptive Indicators of Compromise

    A forensics investigation after a breach often uncovers network and host indicators of compromise (IOCs) that can be deployed to sensors to allow early detection of the adversary in the future. Over time, the adversary will change tactics, techniques, and procedures (TTPs), which will also change the data generated. If the IOCs are not kept up-to-date with the adversary's new TTPs, the adversary will no longer be detected once all of the IOCs become invalid. Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular expressions (regexes), up-to-date with a dynamic adversary. Our framework solves the TTK problem in an automated, cyclic fashion to bracket a previously discovered adversary. This tracking is accomplished through a data-driven approach of self-adapting a given model based on its own detection capabilities. In our initial experiments, we found that the true positive rate (TPR) of the adaptive solution degrades much less significantly over time than that of the naive solution, suggesting that self-updating the model allows the continued detection of positives (i.e., adversaries). The cost for this performance is in the false positive rate (FPR), which increases over time for the adaptive solution but remains constant for the naive solution. However, the difference in overall detection performance between the two methods, as measured by the area under the curve (AUC), is negligible. This result suggests that self-updating the model over time should be done in practice to continue to detect known, evolving adversaries. Comment: This was presented at the 4th Annual Conf. on Computational Science & Computational Intelligence (CSCI'17), held Dec 14-16, 2017 in Las Vegas, Nevada, US.
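
    A rough sketch of the cyclic adaptation idea follows: samples matched by the current regex IOCs are treated as fresh examples of the adversary and used to rebuild the IOC set for the next time window. Building the new IOCs as a literal alternation is a deliberately naive placeholder, not the paper's actual regex-generalization method, and the function names are illustrative only.

        # Rough sketch of the self-adapting IOC cycle (naive placeholder, not
        # the paper's method): detections made with the current regexes become
        # the examples from which the next regex set is built.
        import re

        def detect(regexes, samples):
            """Return the samples matched by any current indicator."""
            compiled = [re.compile(r) for r in regexes]
            return [s for s in samples if any(p.search(s) for p in compiled)]

        def rebuild_iocs(detected):
            """Placeholder: collapse detections into one literal alternation."""
            return ["|".join(re.escape(s) for s in detected)] if detected else []

        def track_the_known(initial_regexes, windows):
            """Cycle over time windows, adapting the IOC set after each one."""
            regexes = list(initial_regexes)
            for samples in windows:   # each window is a list of observed strings
                hits = detect(regexes, samples)
                if hits:              # adapt only when something was detected
                    regexes = rebuild_iocs(hits)
            return regexes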

    Plant Trait Diversity Buffers Variability in Denitrification Potential over Changes in Season and Soil Conditions

    BACKGROUND: Denitrification is an important ecosystem service that removes nitrogen (N) from N-polluted watersheds, buffering soil, stream, and river water quality from excess N by returning N to the atmosphere before it reaches lakes or oceans and leads to eutrophication. The denitrification enzyme activity (DEA) assay is widely used for measuring denitrification potential. Because DEA is a function of enzyme levels in soils, most ecologists studying denitrification have assumed that DEA is less sensitive to ambient levels of nitrate (NO3-) and soil carbon, and thus less variable over time, than field measurements. In addition, plant diversity has been shown to have strong effects on microbial communities and belowground processes and could potentially alter the functional capacity of denitrifiers. Here, we examined three questions: (1) Does DEA vary through the growing season? (2) If so, can we predict DEA variability with environmental variables? (3) Does plant functional diversity affect DEA variability? METHODOLOGY/PRINCIPAL FINDINGS: The study site is a restored wetland in North Carolina, US, with native wetland herbs planted in monocultures or mixes of four or eight species. We found that denitrification potentials for soils collected in July 2006 were significantly greater than for soils collected in May and late August 2006 (p<0.0001). Similarly, microbial biomass standardized DEA rates were significantly greater in July than in May and August (p<0.0001). Of the soil variables measured (soil moisture, organic matter, total inorganic nitrogen, and microbial biomass), none consistently explained the pattern observed in DEA through time. There was no significant relationship between DEA and plant species richness or functional diversity. However, the seasonal variance in microbial biomass standardized DEA rates was significantly inversely related to plant species functional diversity (p<0.01). CONCLUSIONS/SIGNIFICANCE: These findings suggest that higher plant functional diversity may support a more constant level of DEA through time, buffering the ecosystem from changes in season and soil conditions.

    Harnessing the NEON data revolution to advance open environmental science with a diverse and data-capable community

    It is a critical time to reflect on the National Ecological Observatory Network (NEON) science to date as well as envision what research can be done right now with NEON (and other) data and what training is needed to enable a diverse user community. NEON became fully operational in May 2019 and has pivoted from planning and construction to operation and maintenance. In this overview, the history of and foundational thinking around NEON are discussed. A framework of open science is described with a discussion of how NEON can be situated as part of a larger data constellation, across existing networks and different suites of ecological measurements and sensors. Next, a synthesis of early NEON science, based on >100 existing publications, funded proposal efforts, and emergent science at the very first NEON Science Summit (hosted by Earth Lab at the University of Colorado Boulder in October 2019), is provided. Key questions that the ecology community will address with NEON data in the next 10 yr are outlined, from understanding drivers of biodiversity across spatial and temporal scales to defining complex feedback mechanisms in human–environmental systems. Last, the essential elements needed to engage and support a diverse and inclusive NEON user community are highlighted: training resources and tools that are openly available, funding for broad community engagement initiatives, and a mechanism to share and advertise those opportunities. NEON users require both the skills to work with NEON data and the ecological or environmental science domain knowledge to understand and interpret them. This paper synthesizes early directions in the community's use of NEON data, and opportunities for the next 10 yr of NEON operations in emergent science themes, open science best practices, education and training, and community building.